This article concerns recommendations made in a report to Congress and the Department of Education on evaluation of federally supported education programs. The work covers local, state, and federal efforts to address questions about why and how well evaluations are done, and about how results are used. The recommendations are directed toward improving the quality of evaluations and enhancing their usefulness.
In June 1980, a group at Northwestern University produced a report for Congress on the evaluation of federally supported education programs at the national, state, and local levels. This article addresses only one aspect of the report: recommendations and the rationale for them.

The report was undertaken in response to the Education Amendments of 1978 (Public Law 95-561). The relevant section of the law was introduced as a bill by Congresswoman Elizabeth Holtzman of New York, and it requires that the Secretary of Education conduct a comprehensive study of evaluation practices and procedures. The questions covered in the research are those implied by the law and the conference reports preceding it: Why and how are evaluations carried out? What are the capabilities of those who carry out evaluations? How are the results of evaluation used? What recommendations can be made to improve procedure or practice?¹

These questions were discussed with congressional staff and federal agency personnel to clarify them. The more detailed questions are elaborated in the body of the report. The Holtzman Project was prospective in its orientation, designed to provide evidence and argument bearing on these questions and to provide recommendations that will ameliorate the problems we identified. The project staff relied on two broad sources of information: contemporary investigations by other researchers and agencies, and direct field work. The latter included site visits to eight state education agencies (SEAs) and fourteen local education agencies (LEAs), and telephone surveys of approximately 200 LEAs. The site visits and the larger survey were based on a stratified random sample. Round-table discussions at Northwestern were undertaken to capitalize on experts in special topics, such as school board use of evaluation reports. Interviews were carried out with some staff members of all major federal agencies with an interest in educational evaluation (the U.S. General Accounting Office, or GAO; the Congressional Budget Office, or CBO; the Congressional Research Service, or CRS; and the executive operating units). The literature review covered both unpublished and published documents, including reports maintained by ERIC and the LEXIS system. An earlier article, published in Review of Educational Research, served as a guide to reports issued before 1979.²

Recommendations to Congress are discussed first, and recommendations to the Department of Education (hereafter called "the Department") next. The rationale, qualifications, and limits to the recommendations follow.
RECOMMENDATIONS TO THE CONGRESS
PLANNING AND EXECUTING EVALUATIONS

We recommend that Congress direct the relevant staff of congressional committees, the GAO, and the CBO to meet regularly with evaluation staff of the Department to (a) reach agreement about when particular evaluations are warranted, and the senses in which each evaluation required by law is possible; (b) clarify congressional information needs, quality of evidence required, and planning cycle for each major evaluation required by law; (c) identify specific committees and groups as audiences for evaluation results; and (d) identify the changes in program or understanding that could result from alternative findings.

STATUTORY PROVISIONS FOR EVALUATION

We recommend that Congress, in constructing statutory provisions for evaluation, (a) specify exactly which questions ought to be addressed and the audiences to whom results should be addressed, when specification is feasible; (b) provide for formal assessment of the evaluability of the relevant program when specification is not possible; and (c) provide for statistically valid field testing of proposed evaluation requirements when specification is not possible and in-house assessment is insufficient.

EVALUATOR CAPABILITIES

We recommend that (a) capabilities be assessed before new statutory evaluation requirements are directed to LEAs and SEAs, to determine where resources are adequate to meet the demand; (b) training or technical assistance be expanded when the demands are notable and capabilities are low; and (c) the feasibility and desirability of direct contract programs be explored to capitalize on LEA and SEA capabilities.

USE OF AND AUTHORITY FOR BETTER EVALUATION DESIGNS

We recommend that Congress (a) routinely consider pilot testing every major new program, major variations on existing programs, and major program components before they are adopted at the national level, using high quality evaluation designs; and (b) authorize the Secretary of Education explicitly in each evaluation statute to use high quality designs (especially randomized field experiments) for planning and evaluating new program components, program variations, and new programs, when estimates of program effect are desirable. We recognize that such estimates are not always appropriate or feasible.

CRITIQUE AND REANALYSIS OF EVALUATION RESULTS 

We recommend that Congress, in statutory requirements for evaluation of major programs, (a) also require independent, balanced, and competent critique of evaluation results that are material to policy decisions; (b) require critique of samples of evaluations submitted by LEAs and SEAs in response to legal requirements; and (c) require that statistical data produced by national evaluations be made available routinely for reanalysis.

USE OF EVALUATION RESULTS

We recommend that Congress (a) direct staff of relevant committees, the Department, and the GAO to routinely outline which institutions can reasonably be expected to use results of each major evaluation, and how such results might be used, during the design stage of every major program evaluation; (b) specify exactly which evaluations have been used and why, and which have not been used and why not, in authorizations and appropriations committee reports; (c) require specific information about changes resulting from evaluation whenever the law requires SEAs to describe uses of evaluation; and (d) explore the feasibility of direct competitive grants and contracts programs focused on improving the use of results at the LEA and SEA levels.

STANDARDS AND GUIDELINES

Recently developed standards and guidelines for evaluation are not appropriate for incorporation into law. They are, however, sufficiently well developed to recommend that Congress (a) use such guidelines to understand what can reasonably be expected of evaluations; (b) direct that state and federal agencies use them, where appropriate, as a guide to developing criteria for judging evaluation plans submitted by LEAs and SEAs; and (c) elicit assistance in the interpretation of guidelines from congressional support agencies, such as the GAO, that oversee the execution of policy, law, and regulations.
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■-if- RECOMI^ENDATIONSTO THE * * 

•\C *" "department of education < 

^AUTHORITY FOR TECHNICAL DISCUSSION w ^ 

We recommend that the Department authorize technical staff of evaluation units to initiate discussion of evaluation plans with pertinent congressional staff, at their discretion, and refrain from directives that impede direct discussion.

PLANNING AND EXECUTING EVALUATIONS

We recommend that the Department direct principal evaluation unit staff to meet regularly with relevant staff of committees to (a) negotiate agreement about when particular evaluations are warranted and the senses in which each evaluation required by law is possible; (b) clarify congressional information needs, quality of evidence required, and planning cycle for each major evaluation undertaken by the Department; (c) identify specific audiences or groups for evaluation results; and (d) identify the changes in program or understanding that could occur on the basis of evaluation results.

TESTS OF NEW PROGRAM COMPONENTS, PROGRAM VARIATIONS, AND NEW PROGRAMS

We recommend that the Department explicitly authorize the use of high quality evaluation designs (especially randomized experiments) when the interest lies in estimating the effects of new program components, program variations, and new programs. The authorization should be incorporated into all regulations that require estimating the effects of innovative changes.

CRITIQUE AND SECONDARY ANALYSIS OF EVALUATION RESULTS

We recommend that the Department (a) provide, in its procurement of evaluations and its evaluation policy, for the independent, balanced, and competent critique of every major evaluation funded by the Department; (b) incorporate into procurement procedures and policy the requirement that all statistical data produced in major program evaluations be documented and stored for secondary analysis; and (c) create an administrative mechanism for deciding when simultaneous analysis by both the original evaluator and an independent analyst is desirable and feasible, and create a mechanism for executing simultaneous independent analyses.

ACCESS TO AND SPECIFICATION OF REPORTS

We recommend that the Department adopt a policy to (a) adhere to a clearance rule which makes evaluation reports available after a specified period of time; (b) specify completely the evaluation documents referred to in the Department's Annual Evaluation Report, the Federal Register, and policy statements; and (c) include in every major evaluation report a list of core recipients of the report, or compile publicly available lists of core recipients.

THE USE OF EVALUATION RESULTS

We recommend that the Department direct evaluation unit staff or evaluation contractors to (a) provide, on a regular basis, oral as well as written reports on results of major evaluations and on the uses to which results can be put, to relevant congressional staff, to support agency staff, and to the program staff within the Department; (b) create a system to periodically collect, synthesize, and report specific uses to which evaluations are put; (c) improve the Annual Evaluation Report by citing instances of use more specifically; and (d) meet regularly with congressional staff to clarify information needs, feasibility of evaluation, audiences for results, and ways in which results can be used to modify programs.

IMPLEMENTATION

We recommend that the Department (a) routinely require formal measurement of the degree to which program plans match actual operations; (b) adjoin research on methods of measuring implementation to the introduction of new programs and program variations; and (c) create an inexpensive central information system on the time and resources required for full implementation of new programs.



RATIONALE

These recommendations are based partly on the project's findings and partly on our judgments about what needs to be done to improve evaluation practice. The latter, like all such judgments, is based on barely verifiable experience. We sought advice and criticism from some congressional and agency staff, members of the National Academy of Sciences Committee on Program Evaluation, and LEA and SEA staffs. We have capitalized on the professional forums identified earlier. Time, however, did not permit systematic critique. We are making as much information as possible available for competing analysis.

PLANNING AND EXECUTING EVALUATIONS

The legislative decision to evaluate is complicated by the large number of potential participants: the Congress, the Department of Education's Office of Evaluation, the CBO, and the GAO. The time available to make the decision and to frame specific evaluation questions is variable and often appears to be insufficient. The advice of experts is only sometimes available. The process often leaves ambiguous the type of evaluation that is wanted, the audiences for the evaluation, the probable uses of the evaluation results, and the reasons an evaluation is wanted. The ambiguity leads to unnecessary squabbles and misdirected or delayed efforts.

Elements for a More Orderly Process

The actions that appear to be necessary to improve matters include (a) regular meetings among evaluation staff of the Department and the pertinent congressional committees; (b) a planning system that matches evaluations to authorization cycles; (c) information systems that make access to previous work simpler and faster; and (d) identification of groups that can contribute to the technical quality of the effort.

Reauthorization Cycle

There has been a recent effort to match the production of evaluations to the reauthorization cycle, and we understand from memos and recent activity of the Office of the Assistant Secretary for Management that the effort will be sustained. Sustaining it is imperative if either management or Congress expects evaluations to be used in reauthorization decisions.

Meetings

There is no system of regular meetings among evaluation staff of the agency and the pertinent congressional committees to examine the senses in which a program can be evaluated. We believe that such meetings are essential to assuring that formal legislative demands for evaluation are as well informed as possible, and that the Department's evaluation unit is equipped to handle them. Ideally, such meetings should be held before the law requiring evaluation is enacted. If that is not possible, formal evaluability assessments should be undertaken as soon as possible after enactment.

Those meetings should focus on the information needs of Congress and the Department, notably on the questions that should be addressed in the evaluation. They should also be a vehicle for clarifying the reasons for asking the questions and for identifying the audiences to whom answers ought to be addressed.

Apart from this, meetings might address chronic problems. Because different types of evaluation demand different resources, some agreement on such issues needs to be made explicit, at least occasionally. The factors that influence the feasibility of evaluations (new versus old programs, for example) ought to be presented emphatically. Because every major evaluation must be tailored, the flexibility available, what is known and what is not known about the program, and so on ought to be made reasonably clear.

We do not mean to imply that a lockstep series of discussions among all relevant staff is warranted or possible. The point is that the absence of regular meetings on congressional needs virtually guarantees that some needs will not be met. That in turn invites buck-passing and evaluations of lower utility.

Relevant Groups

The groups that should be involved in the process include evaluation staff from the Department of Education's Office of Planning and Evaluation Services and from the pertinent congressional committees. It is sensible to capitalize routinely on support agencies, such as the GAO's Institute for Program Evaluation and relevant divisions of the CBO and the CRS.



Interest Groups and Their Role

Interest groups that draft bills which create or modify programs should be urged to also provide plans for evaluation of the program in the event that their suggestions are enacted as law. At a minimum, these plans should be routinely reviewed by the Department's evaluation unit; at best, the review should also include congressional staff, program staff, and independent critics.

Impediments

There are impediments to any meetings of the sort proposed. Some agency staff members, for instance, have maintained that they have not been free to initiate conversations that would clarify the intent of a demand to evaluate, because executive policy restricts discussion. The restrictions are said to have a variety of legitimate origins, including preventing agency staff from lobbying directly and independently for pet programs, and assuring that there is at least some orderliness in dealing with the Congress. For evaluation units with authority to evaluate, however, we believe that such restrictions are inappropriate. No evaluator can conscientiously address a question posed by Congress if the question cannot be discussed directly. We believe that agency policy must recognize the relative independence and distinction of evaluation units.

Impediments on the congressional-staff side appear to include committee staffers who will participate in no discussion unless directed by a committee chairman.

The more general problem, we were told, is time and the difficulty of coordinating meetings so as to be reasonably convenient to both agency staff and congressional staff.



STATUTORY PROVISIONS FOR EVALUATION 



Although some statutes are specific about program reporting, references to evaluation in many statutes are very general. The simple requirement "to evaluate the program" or "to evaluate the effectiveness of the program in meeting the objectives of the statute" occurs frequently.

There is, however, great variety in the way individuals at the local, state, and federal levels of government interpret the word "evaluate" in law or elsewhere. The variety lies in the array of questions that might be addressed in an evaluation, in the approaches one might choose to answer them, and in the level of detail with which they might be answered.

A more specific statement of the questions that need to be addressed can help to reduce confusion and ambiguity in what is intended by law, and it can facilitate understanding of the scope and probable costs and benefits of the information.






Specification

If Congress needs to know how many are served and how many are in need, what the services are and what they cost, what effects programs have on their primary or secondary clients, and what the costs and benefits of alternatives are, then Congress should request that information explicitly. The same discipline ought to be asked of interest groups, advisors, and others who draft evaluation language for programs: they can be asked to specify the questions for gauging effectiveness along with other features of the program. That it is possible to be more specific is clear from the statute mandating the National Institute of Education (NIE) Compensatory Education Study. That such specification is not always sufficient is clear from the same study: six months of discussion were needed after enactment to clarify evaluation goals.

It will not be possible or desirable to be explicit in every case. To assure that general demands for evaluation are not misinterpreted, the law should provide for a formal assessment of the senses in which the program can be evaluated within one year after the enactment of the legislation.

Need for Dialogue

Regardless of how well legal requirements can be specified, there is a persistent need for regular dialogue between agency staff and congressional staff in refining questions and developing agreements on what level of quality of evidence is warranted, and at what cost. The dialogue has occasionally been encouraged in congressional committee reports, by some congressional staff, and by some agency staff. But it is irregular and more heavily dependent on individual preferences than it should be.

Audiences

Because evaluation results may be directed to any number of audiences (Congress, Department management, interest groups, advisory committees, and so on), there is a clear need for focus. The more audiences there are, the more difficult evaluations become.

Pilot Tests of Evaluation Demands

When there is substantial disagreement about which questions should be addressed and about how the information might be used, pilot evaluations should be undertaken. That is, one mounts formal small-scale experiments to determine which of several different evaluation schemes works best. They can be put into the field (a) to measure the paperwork burden on respondents, (b) to determine the costs of collecting the information, (c) to determine the quality and usefulness of the information, and (d) to clarify language that can be used in statute and regulation.



USE AND AUTHORITY FOR BETTER EVALUATION DESIGNS

The authority to use better designs (especially randomized experiments) in the interests of less equivocal evaluations of new programs, major new program variations, and major new program components must be made explicit in law and regulation. This recommendation applies only when there is substantial interest in estimating the effects of a program on its primary target group.

By "randomized experiment," we mean, for instance, assigning children, schools, or classrooms randomly to one of two or more program variations and then observing their performance under each regimen. The random assignment guarantees that, in the long run, comparison of the variations will be fair. This is one of the reasons the design has been used in the Negative Income Tax Experiments, in the Manhattan Bail Bond experiments, in evaluation of television programs such as Sesame Street, The Electric Company, and Free Style, and in the evaluation of the effectiveness of medical treatments.
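
What "fair in the long run" means can be illustrated with a small Python sketch. Everything in it is hypothetical: the classroom labels, the two variations "A" and "B", and the baseline scores are invented, and the sketch is not the assignment procedure of any study named above. It assigns units at random, then repeats the assignment many times to show that a pre-existing difference among units (here, widely varying baseline scores) averages out across the groups.

```python
import random
from statistics import mean

def assign_randomly(units, variations, rng):
    """Randomly assign each unit (a child, classroom, or school) to one of the
    program variations, keeping group sizes as equal as possible."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out in turn, so group sizes differ by at most one.
    return {unit: variations[i % len(variations)] for i, unit in enumerate(shuffled)}

def average_baseline_gap(baseline, variations=("A", "B"), trials=2000, seed=1980):
    """Repeat the random assignment many times and return the average
    difference in baseline scores between the first two groups.
    Because assignment ignores the scores, this average tends toward zero."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        assignment = assign_randomly(baseline, variations, rng)
        group_a = [baseline[u] for u, v in assignment.items() if v == variations[0]]
        group_b = [baseline[u] for u, v in assignment.items() if v == variations[1]]
        gaps.append(mean(group_a) - mean(group_b))
    return mean(gaps)

# Hypothetical classrooms whose baseline scores differ widely (1 through 20).
baseline = {f"classroom-{n}": float(n) for n in range(1, 21)}
one_draw = assign_randomly(baseline, ["A", "B"], random.Random(7))
print(list(one_draw.values()).count("A"))  # 10 classrooms in each group
print(round(average_baseline_gap(baseline), 2))  # a value near 0
```

Any single draw can, by chance, put more high-scoring classrooms in one group; the guarantee concerns the average over repeated assignments, which is why the sketch reports both.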

The rationale for the first part of the recommendation, pilot testing new programs, is that higher quality evaluations are more feasible before the program is adopted at the national level. Better evaluation designs can be employed, conclusions are less likely to be ambiguous, and political-institutional constraints are less likely to be severe. The introduction of new programs can be staged so that earlier stages constitute pilot tests for the later ones. This may seem terribly mundane to some readers. But recognize that in recent political discussion of the proposed Youth Incentives Program, an enterprise whose costs may exceed $850 million per year, there had been no formal attention to pilot testing or staged introduction before 1979. Title I compensatory education programs evolved in the same way ten years ago, and we still know pathetically little about effective variations. The simple notion that massive new programs ought to be pilot tested is still warranted.

The second part of the recommendation, concerning higher quality evaluation designs, is based on the presumption that we won't learn how to bring about detectable changes in the performance of children or schools without more conscientiously designed tests. The justification for the recommendation lies partly in the poor quality of designs used in the field. It is discouragingly easy to find, for example, legislative testimony in which a Title I program is declared to be a success by an individual because "test scores went up." We do not advocate attempting to estimate program effects in all cases; the process of estimating effects is complicated under the best of conditions. Where effects are to be estimated, we advocate attention to high quality designs, especially randomized experiments.
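
Why "test scores went up" is weak evidence can be shown with a small simulation. The sketch is hypothetical throughout: it assumes every child's score rises by about 5 points over the year from ordinary growth, and that the program itself adds nothing. A before-and-after comparison of participants still shows a gain, while the difference between randomly assigned participants and nonparticipants comes out near zero.

```python
import random
from statistics import mean

def simulate(n_children=1000, natural_growth=5.0, true_effect=0.0, seed=1):
    """Simulate one school year for hypothetical children. Scores rise by
    natural_growth for everyone; children randomly assigned to the program
    gain true_effect more. Returns (average pre-post gain of participants,
    randomized estimate: participants' minus nonparticipants' post-test mean)."""
    rng = random.Random(seed)
    treated_gains, treated_post, control_post = [], [], []
    for _ in range(n_children):
        pre = rng.gauss(50.0, 10.0)        # invented pre-test score
        in_program = rng.random() < 0.5    # coin-flip assignment
        post = pre + natural_growth + rng.gauss(0.0, 5.0)
        if in_program:
            post += true_effect
            treated_gains.append(post - pre)
            treated_post.append(post)
        else:
            control_post.append(post)
    return mean(treated_gains), mean(treated_post) - mean(control_post)

pre_post_gain, randomized_estimate = simulate()
print(f"participants' scores went up by {pre_post_gain:.1f} points")
print(f"randomized estimate of program effect: {randomized_estimate:.1f} points")
```

The comparison group is formed by the same coin flip that selects participants, so growth that would have occurred anyway appears in both groups and cancels out of the difference; the pre-post gain has no such protection.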

At the local level, there are some evaluators with the interest and the skill to employ the design for the sake of fair tests. An obstacle, we believe, is confusion about authority for running such tests. One evaluator, for instance, offered the opinion that the design is desirable, of course, but that in the absence of a clear statutory mandate he could not risk employing it. Federal program managers have failed to encourage randomized experiments at the local level partly because the mandate to do so is not explicit. At the federal level, some authority already exists. Indeed, evaluations at that level (such as the work conducted for the Emergency School Assistance Act) have employed state-of-the-art experimental designs. But we believe that the authority must be made more explicit.

Feasibility

The usefulness of randomized tests, in principle, is not usually at issue in discussions about estimating effects of new programs. Argument about the uses of the design concerns the idea that randomized experiments are rarely feasible in field settings. Rarity does not establish lack of feasibility, and in any case a notable number of field tests have been mounted. (This is imperfect evidence, in that it doesn't guarantee that an experiment can be mounted successfully in the situation at hand.)

Pilot tests of experiments can yield more direct evidence on the feasibility of randomized experiments or other high quality designs. Consequently, we recommend mounting a small assessment prior to major field experiments to identify problems in the field and to resolve them. Randomized experiments in education, as in medicine, economics, and other fields, can fail to be implemented successfully because the randomization is incomplete, because the programs are not implemented as advertised, and for other reasons. Pilot tests of the experiment itself can help to avoid unnecessary flaws in implementation.

When dependable precedent and the opportunity for adequate pilot tests of the evaluation design are lacking, two general criteria for judging the feasibility of randomized experiments are sensible. The first criterion hinges on the fundamental notion of equity. When there is an oversupply of eligible recipients for a scarce resource (program services), randomized assignment of children to the resource seems fair. So, for instance, Vancouver's Crisis Intervention Program for youthful status offenders affords equal opportunity to eligible recipients: since not all could be accommodated, and all are equally eligible, they are randomly assigned. Some experts argue that randomized experiments are most likely to be carried out successfully when the boon, real or imagined, is in short supply and the demand for it is high. This rationale dovetails neatly with normal managerial constraints. That is, new programs cannot be emplaced all at once, and not all eligible candidates can be served at once. Experiments can then be designed to capitalize on staged introduction of programs or services.
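
The lottery idea and staged introduction fit together naturally, and a short Python sketch shows how. It is a hypothetical illustration only: the applicant labels, the wave size, and the function name `lottery_waves` are invented, and it is not the procedure used by the Vancouver program. Equally eligible applicants are put in random order and served in waves of fixed size, so those still waiting form a randomized comparison group for those served first.

```python
import random

def lottery_waves(eligible, slots_per_wave, seed=None):
    """Randomly order equally eligible applicants and split them into
    successive waves of at most slots_per_wave each. Every applicant has
    the same chance of an early slot; applicants in later waves serve as
    the comparison group for earlier waves until their own turn comes."""
    if slots_per_wave < 1:
        raise ValueError("slots_per_wave must be at least 1")
    rng = random.Random(seed)
    order = list(eligible)
    rng.shuffle(order)
    return [order[i:i + slots_per_wave] for i in range(0, len(order), slots_per_wave)]

applicants = [f"youth-{n:02d}" for n in range(1, 26)]  # hypothetical: 25 eligible
waves = lottery_waves(applicants, slots_per_wave=10, seed=42)
first_wave = waves[0]
waiting = [a for wave in waves[1:] for a in wave]  # comparison group for now
print([len(w) for w in waves])  # [10, 10, 5]
```

Because no one is permanently denied service, only delayed by chance, this arrangement often meets the equity concern while still yielding a fair comparison during the waiting period.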

A second criterion concerns settings in which it is politically unacceptable to assign individuals randomly to control conditions, despite the fact that we know nothing about whether a program works relative to no program at all. The ethical, moral, and economic justifications for experimenting may then be quite irrelevant. In such cases, it is often possible to ameliorate difficulties by comparing program variations against one another, rather than comparing a novel program to an existing one or to no program at all. A "no-program" control condition may be an unacceptable political option whether or not the program fails. The most we can reasonably expect then is to choose the variation or component that works best for the investment.

The idea of testing variations or components, rather than testing a program against a control, is a compromise. But we believe that it is better than getting no information at all on the effects of the program in question. In particular, for ongoing programs that have strong public support, it seems sensible to think in terms of randomized assignment to new program variations or to new program components to discover more effective or cheaper versions of the program.

The most direct action that Congress can take to ameliorate the problem involves any statute that asks that the effects on children of a new program variation or new components be estimated. We recommend that such statutes include an explicit provision authorizing statistically valid randomized experiments. For existing programs, we believe that some explicit authority is necessary to foster fair tests. That is, the Secretary of Education should be empowered to waive compliance with technical aspects of statutes or regulations for experimental projects that are likely to assist in promoting the statutory objectives. This would facilitate, for instance, randomized tests of cheaper variations on Title I programs, student loan programs, and the like.




INDEPENDENT CRITIQUE AND SECONDARY ANALYSIS 

We recommend to Congress and the Department that major program evaluations be subjected routinely to competent, independent critique and secondary analysis. Mechanisms should be created to permit routine critique of a sample of evaluations produced at the LEA and SEA levels. By "critique" we do not mean adverse commentary. We do mean balanced examination of the quality of the report and judgments about whether recommendations can be sustained by the evidence. "Secondary analysis" here refers to analysis of raw statistical data, undertaken to improve on the quality of earlier analyses.

The origins of this recommendation lie partly in the idea that we should recognize good-quality evidence as such and properly identify poor evidence. There is also some need to prevent the ingenuous use of poor evidence and to avoid relying unnecessarily on one's confidence in a single evaluation. Furthermore, major evaluations are expensive. It seems sensible to allow the community of policy makers or their advisors to make the data work repeatedly, at low cost, in secondary analysis. Because evaluations may affect a variety of interest groups, those groups should be given an opportunity to offer competent criticism. Finally, we believe that the absence of independent criticism can reduce the importance of good evaluations.

Elements of a System for Critique and Secondary Analysis: National Level

The elements of an effective system for critique and secondary
analysis include (a) explicit institutional policy on rapid disclosure of
reports and access to statistical data underlying the reports; (b) a formal
mechanism for independent critique or secondary analysis, when
possible, during an evaluation; (c) a formal administrative mechanism
for independent critique and secondary analysis when evaluation results
are submitted; and (d) formal guidelines on reporting and storage of
statistical information.

Elements of policy on reanalysis have already received some
attention. For instance, the GAO has, in its guidelines on impact
evaluation, taken the position that access to evaluative data for
reanalysis is generally an important consideration. The Department of
Education has not had a formal policy on disclosure of statistical data.
However, the Department's Office of Planning and Evaluation
has had an unwritten policy and has released data periodically for
independent review and secondary analysis. Informal critiques of data
sets have been undertaken by the CBO as part of its efforts to screen
studies for quality. These activities are undertaken so as to respect
individual privacy needs. Making the policy formal, creating the
administrative mechanisms, and testing them are sensible next steps.

Rapid access to evaluation reports has been a problem. Clearance of
evaluation reports by the Secretary of Education, according to federal
staff members who were interviewed, was slow at best. (We understand
that the Department has recently adopted the 10-day clearance rule,
which should improve matters.)
Informed Criticism

Opinions about the desirability of early independent review of major
evaluations and of secondary analysis are not uniform. At least some
agency staff reckon that a routine process will generate more heat than
light. Assuring competent criticism in this arena is likely to be as difficult
as it is in medicine, economics, and other fields. Some of that criticism is
bound to be specious, dull-witted, and self-interested. High quality in the
design and execution of evaluations offers some protection against
unwarranted criticism, but it is unlikely to be sufficient. Outcome
evaluations are always subject to criticism, especially if the program
does not work. We believe, however, that openness to criticism must be
given priority, and that some administrative research on reducing
mindless criticism should be undertaken.

State and Local Level 

A good many local and state evaluations provide no more than
counts of those served, changes in test scores, and similar information.
Regular, systematic reanalysis of the raw data underlying all of them is
not warranted. It is clearer that samples of reports ought to be
reviewed and criticized periodically. The main purpose of independent,
competent criticism is to assure that the quality of evidence used to
inform decisions is recognized. We also expect that this sort of critique
will help to improve the quality of the exercise in the long run.

A variety of institutional vehicles are available to conduct
reviews. States with fairly well-developed evaluation units are a natural
option. California and Michigan, for instance, have review, validation,
and dissemination systems to try to assure that information about good
programs of all kinds is available to LEAs. Such units may not, however,
be independent of program offices. Moreover, some field investigation
is warranted to determine whether or not evaluation capabilities are
sufficient to generate high-quality critique.

The federal Joint Dissemination Review Panel (JDRP) is also a
vehicle for critique of samples of evaluations. Its role is now limited to
examining evidence volunteered by LEAs and other agencies that
believe they are strong enough to sustain frank criticism, and so its
mission would have to be expanded. The number of reviewers available
on JDRP is not sufficient to review even a small additional sample, and
so the panel would have to be enlarged.
Technical Assistance Centers (TACs) supported under Title I
constitute another option. But their role is confined to providing advice
about Title I programs, and only when asked. It is not clear that TACs can be
regarded as independent reviewers, simply because they may provide
advice on evaluations in the first instance.

The problem of assuring decent review and reanalysis of evaluation
reports is sufficiently important to warrant further examination by
federal agency management. That examination should address (a)
alternative plans and administrative vehicles for critique and reanalysis,
(b) alternative sample designs and time frames, and (c) design of pilot
tests for review so as to estimate the costs and benefits of a system before it is
put in place, to determine whether the effort is indeed justified.

ACCESS TO REPORTS

Effective mechanisms to assure early release of evaluation reports
and ready access to reports ought to be created. The origins of this
recommendation lie partly in the idea that evaluation reports offered as
a basis for policy, major executive decisions, and oversight should be
open to competent criticism and should be accessible to a wide variety of
potential users. The recommendation stems partly from the difficulty
encountered in obtaining reports at the federal level, though access at this level
is far less difficult than it is at other levels of government.

Rapid access to reports prior to 1980 was impeded by clearance
processes within the education division of the Department. That is,
reports issued by a contractor were reviewed by the Executive
Secretariat of the Department before release, and those reviews
resulted in delays in release without notable improvement in the
documents themselves.

The inclusion of a clause in Department contracts (Article 28)
requiring that permission be sought before even discussing an
evaluation is more invidious. It prevents some universities from bidding
on evaluations, since the clause runs counter to university standards of
intellectual independence. It is possible that this proviso reduces the
quality of reports by impeding discussion of projects in professional
forums. The mechanical process of identifying and obtaining a
report or a cluster of reports bearing on a specific evaluation is also very
tedious.
The problem of assuring rapid access has been rectified, at least in the
sense that the Office of the Deputy Assistant Secretary for Evaluation
and Management has established a new clearance process. Reports are
to be released automatically after 10 days if the Secretary-level review
produces no modifications. The memorandum establishing the process
also permits program managers to append criticism to the released
document. We believe that automatic clearance after a specified period
is desirable and that the practice ought to be maintained.

The practice of requiring contractors to seek permission before discussing
results in public forums has not been changed, as far as we know.
Our recommendation is that no such requirement be imposed in
contracts.



DISTRIBUTION OF INFORMATION

We suggest the creation of a department-wide periodical that
identifies and abstracts each evaluation report submitted to the
Department and submitted by the Secretary of Education to Congress.
We expect this to ameliorate access problems inside and outside the
government. At its best, such a periodical will keep the public, Congress,
and staff of the Department abreast of what has been produced and
perhaps even of why it was produced. Models for this include the GAO's
Monthly Reports, which summarize reports issued by that agency.



Responsibility for Distribution

The practice of assigning sole responsibility for final reports to the
project officer is not entirely effective. Officers vary in their attention to
circulating reports and submitting them to distribution centers such as
ERIC; they shift agencies, resign from government service, and
otherwise disappear. So, at times, do reports. Mechanisms must
be developed to avoid reliance on a single officer. The options include
(a) strengthening internal agency capability for storage of reports; (b)
assuring that the list of core recipients for reports is included in the
reports themselves, or that such a list is publicly available; (c) requiring
the contractor and the agency to maintain a list of reports generated
(with full citations), together with the location of the agency that
disseminates each; (d) requiring that the recipient of each evaluation
contract or grant provide reports, abstracts of reports, or both,
after 10-day clearance, to ERIC, the National Technical Information
Service (NTIS), the pertinent education centers and laboratories, the
Committee on Evaluation and Information Systems (CEIS), the Federal
Education Data Acquisition Council (FEDAC), and congressional staff
and support agencies (especially the CRS and GAO); and (e) distributing
each report routinely to every federal evaluation project officer and
every evaluation contractor.

TRACKING THE USE OF EVALUATIONS 

Our attention to this topic stems partly from the arguments about
whether evaluations are used. The answer is easy: some are and some
are not. The more interesting questions concern how they are used, how
often they are used, how to balance their cost against their use, and how to
encourage use. The last two questions cannot be answered adequately
now, because the hard answers to the "how" and "how often" questions
are fragmentary, and the soft answers are rather too dependent on
flawed memory and competing interests.

The problem of verifying use or nonuse hinges partly on turnover of
staff responsible for initiating, conducting, and using evaluations.
Corroborating use of an evaluation through independent sources is
difficult and sometimes impossible. Titles of reports often imply nothing
about potential or actual use, and reports are misremembered or
forgotten. Incomplete citation is a chronic problem. The following
recommendations are mundane but critical for inexpensive tracking. At
best, they will eliminate part of the burden placed on respondents
in studies of the use of evaluations.

Better Specification in Reports and Regulations

Congressional reports, agency annual reports, regulations, and the
like are often imprudent, or at least sloppy, in failing to specify which
evaluations they have used. Yet such reports can be useful in tracking use
of evaluations and in improving them. References to evaluations should
therefore include the author, title of the report, date of issuance, and
sponsoring agency. If congressional or agency staff themselves cannot
supply full references, then merely hiring an inexpensive, bright graduate
student to build a specific reference list for each report of the half-dozen
or so congressional committees most pertinent to educational evaluation
would suffice, so long as access to the list and its dissemination
were assured.

The recommendation applies both to congressional committee
reports and to major agency documents such as the Department's
Annual Evaluation Report and policy statements. CBO documents are
somewhat more conscientious, and GAO documents normally carry at
least part of the information suggested. This recommendation also
applies to proposed and final regulations issued in the Federal Register,
since evaluations do result in regulation changes but are rarely
recognized completely in the prose describing the changes. An
illustration of exemplary practice is the modification of regulations on
day care in 1980.

The practice of recognizing evaluations explicitly when they have
been useful in deliberations of Congress and at the executive level is
admirable. Identifying what is useful guides the agencies in the long run
(if not the short run), rewards those who perform well, and exhibits some
integrity to an occasionally cynical audience. The practice of
recognizing good evaluations that are used is not uniform, however.
Owing to time and resource constraints, the sponsoring agency is often
not given credit. That problem is serious enough to discourage some
staff, if not to demoralize them. It would be helpful if more conscientious
attention were given to recognizing useful evaluations, and to
recognizing useless evaluations, in committee reports and the like.


Improving the Department's Annual Evaluation Report

The Annual Evaluation Report enumerates uses of evaluations
completed by the Department. It is important, and there are some ways to
make it more useful. (1) The report on use should provide specific
citation of each evaluation report: its author, title, date of issue, and
issuing agency. Otherwise, it is impossible for the reader to verify that a
report has been issued, much less that it has been used. (2) The report on
use should provide specific citation of hearings or congressional reports
in which an evaluation report is mentioned or used, and specific citation
of regulations that are said to have been changed on the basis of
evaluation results. It should cite regulations that are proposed or created
as a result of the evaluation. Otherwise, verifying claims of use is difficult
or impossible. (3) The contributors to the section on use of evaluations
should be acknowledged to permit verification and corroboration. (4)
The Annual Evaluation Report's perspective on use ought to be
reexamined to identify flaws in indicators of use, such as citations of
hearings, and the possible biases in them. Ignoring agencies apart from
Congress makes it likely that use of evaluation results is understated.
Very little information on management uses, apart from regulations, is
provided. (5) Evaluations for which it is difficult to find verifiable
evidence of their use should be identified. Evaluations that are virtually
useless two years after production should be identified explicitly. One
further issue that ought to be addressed in future examinations of uses is
whether reporting other than annually would make sense.



Identifying the Recipients of Reports

Major evaluation reports should include a list of the individuals,
committees, and agencies to whom the report was sent. This will
facilitate tracking the use or nonuse of reports and improve our
understanding of misdirected effort. The practice of appending reader
lists to reports is current at the Office of Naval Research. The practice
appears to be feasible for at least major evaluation reports. Where
enumeration of members of the audience is not feasible, the lists used
internally as a basis for distribution of reports ought to be accessible.



Tracking Management Change

Very little systematic, publicly available evidence exists on the
managerial uses of evaluation. Moreover, there is no general mechanism
for regularly following up on whether problems identified in an
evaluation have been or can be rectified. Follow-up does occur
episodically, for instance through questions addressed to managers at
committee hearings. But we have been unable to identify any special,
orderly record-keeping on the matter. We recommend that a simple
examination of alternative mechanisms be undertaken to determine if a
cheap follow-up system can be developed, and to determine how such
mechanisms can be field tested.

Local and State


We have not investigated state uses of evaluations sufficiently to
make recommendations on tracking mechanisms at that level. However,
two features of some local and state efforts are worth considering by
both federal and state agencies. Some states, such as Massachusetts and
Michigan, require that local reports to the state regularly describe the
various uses of evaluations. Those reports are, in principle, a vehicle for
tracking use and, occasionally, for synthesis. It is possible that a few
states have developed especially efficient ways to accomplish this task.
If so, the procedures ought to be made available to other state and
federal agencies. The alternative to regular reporting is a special survey,
undertaken periodically, to obtain a better picture of uses than one could
obtain from reports. At least one state, California, has tried this option,
and the results were informative.

STANDARDS AND GUIDELINES


Current guidelines can be exploited in designing evaluations and in
making crude judgments about the quality of an evaluation report;
however, they are not equally relevant to all types of evaluation, and
they are not appropriate for inclusion in law or regulation. They should
be recognized in policy statements, internal guidelines, and other
flexible directives.

Guidelines have been developed by the GAO, the Evaluation
Research Society, and the independent Joint Committee on Standards
for Educational Program Evaluation. Standards are embodied in
manuals used by the federal Joint Dissemination Review Panel in
assessing the educational worth of new programs.

There is substantial overlap in the topical coverage of all of these.
Moreover, the topical coverage overlaps with standards used in
choosing designs for major national evaluations and in awarding grants
for evaluative work supported by NIE and the Department of Education.

The guidelines are very general, as any set of guidelines on
completeness and quality of evidence must be, given the variety of forms
that evaluation may take. It is sensible, for instance, to expect that an
evaluation that purports to estimate a program's effects on children
covers pertinent topics: evaluation design, source and quality of
information, competing explanations, and so on. These elements are
part of most good guidelines, but they are no substitute for training and
judgment.

The main justification for recommending that guidelines be
recognized is that we believe they can be useful in clarifying what is
meant by quality of evaluation and in informing the public about what
can generally be expected of evaluation. Guidelines may also be of some
assistance in protecting the competent evaluator from gratuitous
criticism and in identifying inept evaluation. They can be useful in
reviewing proposals made by LEAs for programs that require special
evaluation, such as bilingual education.


National Level

We recommend that guidelines be formally recognized as such by
agency executives and by congressional committee staff. They have
already been recognized by evaluation staffers within the education
agencies and the GAO; indeed, agency staffers contributed to
their development. By recognition we mean formally acknowledging
their existence, assuring that pertinent staff know about them, and
testing the guidelines in the field. It would not be difficult to incorporate
short reviews of guidelines into training programs and seminars on
evaluation run by the CRS, the GAO, or the Federal Executive
Institute.



State and Local Level


It is reasonable to assure that SEAs and LEAs know about the
guidelines, to make guidelines available, and to encourage tests of
guidelines at the local level. Guidelines can, for instance, be cited
in requests for proposals (RFPs) and grant material without
demanding that they be followed. They may be made available through
special-purpose information clearinghouses, such as ERIC, or through
commercial publishers.

It is reasonable to encourage their use, but not to require it, in the
interests of fostering better-quality evaluations and protecting
competence. That encouragement can be given through federal and state
agency offices that disburse funds for innovative programs.

Responsibility for advising the public, administrators, school boards,
and so on currently rests with evaluation staff at local and state levels. It
is not unreasonable to urge that they make guidelines available to these
audiences for evaluation results. The guidelines are pertinent, however,
only to the minority of LEAs, namely, the ones that do more than simple
monitoring.
Field Tests

We do not recommend incorporating guidelines into law or
regulation. Only some aspects of guidelines have been field tested, and
regardless of how reasonable they appear to be in principle, their costs
and benefits need to be better established before they are generally
required. It is also sensible to determine their susceptibility to
incompetent interpretation, misinterpretation, and corruption. Finally,
guidelines will change a bit as the state of the art in evaluation develops.
Formal tests may help to avoid a prematurely rigid posture on what
constitutes quality.

Caveats

Contemporary guidelines cannot simply be applied to evaluation
reports produced by LEAs in response to federal or state reporting
requirements. In the first place, reports differ appreciably in content,
depending on audience. Reports made to Parent Advisory Committees
in Title I programs, for instance, contain information that differs in
depth and in kind from information provided to states. Second, federal
requirements are minimal. Any review of what is produced to fill
requirements is likely to be a useful target for guidelines, simply because
reports are more useful at local and regional levels.



ESTIMATING THE EFFECT OF PROGRAMS

The general expectation that all local, state, and federal education
agencies will produce clear evidence on the effects of programs should
be abandoned. The emphasis should be placed on finding better
variations on programs and effective program components in LEAs and
SEAs that have the resources to plan fair field tests, and on
well-designed tests run by the federal government.

Measuring growth of children in intellectual achievement, personal
development, and other areas is often warranted. However, the practice
of attributing growth to a program on the basis of these data alone is not
warranted, simply because there are so many competing explanations
for growth or any other change. Local and state evaluations rarely
recognize competing explanations.
The demand for information about how much a program affects
children must recognize that clearly interpretable estimates of this
depend on evaluation designs that accommodate competing
explanations. Those designs are not always feasible in local settings.
Technical assistance is no substitute for resources, for local interest in
estimating effects, or for those designs. Moreover, estimating effects at
the local level often has lower priority than providing services that
children and their parents believe are effective.

The demand for estimates of effect on children induces a kind of
benign hypocrisy among some staffers, administrators, and local
contractors responsible for programs and evaluations. An increase in
test scores is treated as evidence that the program "works," for instance.
The conscientious members of each camp will admit that other
explanations (normal growth, for instance) are possible. They also
admit (and we agree) that separating out the influence of the program
from other influences is not possible without a great deal of managerial,
legal, and technical effort, and it may be impossible despite those efforts.
That admission rarely appears in evaluation reports on Title I programs,
vocational education, and bilingual education.

Judging from our site visits, some LEAs and SEAs are interested in
testing cheaper varieties of programs, program components, and
program variations, and some of these are capable of doing this well. It
is sensible to capitalize on that interest if the evaluations are well
designed. To the extent possible, contracts for doing so ought to be
made available. Funds have been available through Title IV-C and some
NIE programs. They can lead to better understanding of what works and
what works inexpensively, and to the dissemination of the programs to
interested local agencies. The effort may have to be augmented with
assistance from universities, private contractors, technical assistance
centers, or others. These are not substitutes for in-house staff or for
strong administrative support of fair tests from administrators and
oversight groups.

The national interest in understanding the effects of new programs, as
well as the quality of delivery, needs to be recognized and reiterated. The
conduct of pilot tests of new programs should be supported when
feasible and appropriate. This recommendation stems partly from the
progress made over the past ten years in mounting field tests of new
programs, program variations, and program components. There have
been imperfections and failures in these tests, to be sure. The execution
of good outcome studies is exceedingly difficult. These problems should
not be regarded as excuses to avoid the virtue of understanding effects.
The public interest in evidence of this kind (in education or in other
areas, such as medicine and economics) has not been consistent.
Planned tests are always vulnerable on this account, as well as on
account of their youth.

The questions about how money is spent and to whom services are
delivered (so-called process evaluation or implementation studies) are
also important. Judging by recent work, this information has been given
too little emphasis. There must be some stress, however, on
obtaining more than body counts, to supply more than nominal
statements of where dollars go and who receives the services. The
character of services is often poorly understood. Of course, no such
investigation will help one understand whether more notable effects are
produced by certain services than by cheap competitors or by no service
at all.

ENCOURAGING INTEGRITY

Evaluation often engenders concern among those whose program is
evaluated. This, in turn, can provoke institutional pressure to find nice
results if the evaluator is under the supervision of the program manager.
Consequently, maintaining integrity can be difficult. The following
options were developed to explore how one might foster integrity in
the face of such pressure at federal, state, and local levels.

Posture at the Policy, Management, and Oversight
Levels of Government

There is some argument for the view that administrators of new and
innovative projects should not be judged solely on the basis of the
outcome of the program for which they are responsible. Many
educational projects are high-risk ventures, and their failure is often, if
not always, beyond the control of any individual or institution. It is
important to understand why we fail. Program managers and their
staffs, then, should be judged at least partly on the quality of evidence
bearing on a program, regardless of whether one finds that the program
itself is a success. To be effective, that view would have to prevail at
national, state, and local levels.
Design of Evaluations of New Programs

It is sometimes possible to accommodate fear of evaluation through
the design of evaluations. One of several simple ways of doing so is not to
evaluate program A by comparing it to no program at all. "No program
at all" is often not a politically viable option if A fails. Rather, one ought
to compare variation A of the program to variation B, where each
variation has identical objectives but differs in cost, approach, or
other characteristics. The difficulty with this option is that we often lack
the imagination or resources to invent B; and of course, it provides no
information on the effects of A relative to no program at all.
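The logic of comparing variation A with variation B can be made concrete with a small Python sketch. It assumes, hypothetically, that units such as schools can be randomly assigned to one variation or the other and that a single numeric outcome is recorded for each unit; the function names and data structures are illustrative inventions, not procedures from the report. The quantity returned is a difference between A and B, so it carries no information about A relative to no program at all.

```python
import random
from statistics import mean


def assign_variations(units, seed=0):
    """Randomly split units (hypothetically, schools) between variations A and B.

    A seeded generator makes the assignment reproducible, so it can be audited.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


def estimate_a_minus_b(outcomes, groups):
    """Difference in mean outcome between units given A and units given B.

    `outcomes` maps each unit to a numeric result (a hypothetical input).
    The estimate describes A relative to B only, not A versus no program.
    Each group must contain at least one unit, or `mean` raises an error.
    """
    mean_a = mean(outcomes[u] for u in groups["A"])
    mean_b = mean(outcomes[u] for u in groups["B"])
    return mean_a - mean_b
```

For example, `estimate_a_minus_b(outcomes, assign_variations(schools))` would give the A-minus-B difference in mean outcome for a hypothetical list of schools. Random assignment is what allows that difference to be read as an effect of the variation rather than of preexisting differences between sites; a real field test would also need enough units and an estimate of sampling error, which this sketch omits.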

External Review

One way to assure that incompetent evaluations and competent
evaluations are properly labeled as such is to subject completed
evaluations to external review. This tactic is consistent with the aims of
the education agencies, the GAO, and other agencies with an interest in
quality and standards of evidence. It is also consistent with the recent
trend toward secondary analysis of program evaluation data conducted
by independent academic institutions. The latter option has been used
by, among others, the former U.S. Office of Education, the NIE, and the
Law Enforcement Assistance Administration. A variant on the tactic has
been tried by individual researchers in Pakistan in reviewing evaluations
sponsored by the government. This option cannot directly assure that
evaluations done with integrity will be rewarded. Gratuitous criticism is
common. It should make it more likely, however, that poor evaluations
are recognized as such and are not rewarded.

Joint Dissemination and Review Approaches

Consider a review board that has clearly defined standards for
examining the quality of evaluations and that examines quality upon
request from the program manager. A main objective of this panel or
board is to officially verify the quality of the evidence and to declare that
the program (if effective) deserves to be disseminated. Further, such a
seal of approval can become a device for obtaining more money for
similar projects from an agency. Both official recognition and the
opportunity to apply for dissemination funds are appreciated, we
believe, by competent evaluators and program developers.

Such a system has been operating with some success within OE and
NIE. The JDRP reviews educational products, basing review on
evidence that conforms to articulated standards. Approval makes the
products eligible for money earmarked for expansion, dissemination,
and other purposes.


Explicit Policy on Independence of Evaluation


There is no substitute at the national, state, or local level for policy on
the relative independence of the evaluator. Such policy can assure
bureaucratic independence, notably by eliminating clearance
requirements for discussing reports with, or disclosing them to, any
group. It may involve administrative independence, notably by assuring
that the evaluator report to an individual other than the program
manager. It may involve fiscal independence, notably by assuring that
funds earmarked for evaluation are channeled through the evaluation
unit, by setting salaries for the unit independently of salaries for
program operating units, and by other methods. It may involve political
independence, for example, through bipartisan approval of the director
of evaluation, in the same spirit as appointments are approved for the
Inspector General and the Comptroller General at the national level of
government.



EVALUATOR CAPABILITIES

The primary reasons for suggesting that demands for evaluation be
preceded by "capabilities assessment," particularly at the state and local
levels, are discussed below.

First, identifying who is and who is not an evaluator (not to mention
the appropriate competency level) is often difficult. Depending on the
program and the assigned tasks, program staff, evaluation unit staff,
outside contractors, or graduate students may have evaluation
responsibility. Second, because the field is less than fifteen years old, few
institutions offer formal certification in the area. There is considerable
debate about training, and graduate curricula vary in emphasis across
institutions.

More important, the skills and talents required of evaluators in
LEAs and SEAs differ, depending on the evaluation activity. When
evaluation involves simply meeting minimal reporting requirements, the
skills demanded do not require advanced graduate training, but some
technical common sense is essential. When evaluation activities go
beyond the minimum reporting requirements, the level and sophistication
of required skills multiply quickly. These two types of activities and
their capability demands should receive separate consideration in law,
regulation, and evaluation policy.

By capabilities assessment we mean a systematic attempt to describe
the kinds of skills that are required for each kind of task. For national
demands for evaluative information, this may involve intensive field
research (task analyses) of good performers. It need not be elaborate,
however. Observing what people do is better, though of course more
expensive, than merely asking them what they do.

Meeting Federal Evaluation
Reporting Requirements

It cannot be expected that all state and local education agencies have
the capabilities necessary to comply adequately with federal evaluation
reporting requirements. Often, program staff in these agencies
(individuals with responsibilities other than evaluation) assume
responsibility for reporting activities. These persons were not necessarily
hired for their evaluation expertise. Consequently, technical assistance in
evaluation should be provided so that agencies can adequately fulfill
federal evaluation reporting requirements. It might be provided in a
variety of ways. (1) At the minimum, the sponsoring agency should have
direct access to evaluation unit staff with explicit responsibility for
training in evaluation. These individuals can develop appropriate
guidelines for evaluation, arrange evaluation workshops for individuals
who must complete these requirements, and select the proper strategy
for providing technical assistance. Federal program agencies without
these resources should consider creating specific job positions in
evaluation. (2) Adequate resources can be channeled to SEAs that
administer these federal grant programs to permit them to provide easily
accessible and expert technical assistance in evaluation. (3) Federally
supported technical assistance centers, such as those existing for Title I
evaluation, can be established to assist state and local education
agencies in meeting federal reporting requirements. One approach is to
expand the services of the Title I TACs to provide evaluation assistance for
other federal programs.



34 EVALUATION REVIEW / FEBRUARY 1983 



Technical assistance involves instruction and guidance in the actual
conduct of evaluation, e.g., selection of program participants, use of
tests, and completion of federal reporting forms. It also involves
assistance in deciding who will evaluate. For example, districts that have
capable evaluation units should be encouraged to use the services of the
unit for their program evaluation needs. Small districts that do not have
the resources to form their own research and evaluation unit may be
instructed in other options, e.g., the formation of a consortium to hire
competent evaluation staff who serve more than one district. Regional
assistance centers can be developed or augmented in order to better
provide technical assistance in evaluation. When outside contractors are
employed, guidelines must be developed so that program staff and
district selection boards can choose the most competent individuals, be
sensitive to the types of skills required, and be aware of their rights in
contractual arrangements. State guidelines of this kind are rare.

Going Beyond Federal Evaluation 
Reporting Requirements 

Some districts and states attempt to go beyond federal
reporting requirements. If competently executed, these evaluations can
improve the quality of information submitted to federal agencies,
Congress, and such other audiences as Parent Advisory Councils and
school boards. We believe that providing more opportunities to those
LEAs and SEAs with interest and capabilities in evaluation is
warranted. At the state level this can be accomplished through existing
mechanisms, such as the monies targeted for improving state capabilities
and state refinement grants for Title I evaluation supported by
Section 183(c). These funding mechanisms should be supported.
Dissemination of demonstrated improvements in evaluation practices
developed by SEAs through contracts should be promoted.

The improvement of local education agency capabilities deserves
more attention than it has received in terms of discretionary evaluation
activities. While some of this can be accomplished through an expanded
SEA role, other methods can be more specifically targeted at LEAs and
supported directly by the federal government. One option is to
expand the program of direct grants or contracts to LEAs for
evaluation-related activities. This should allow LEAs to apply for and
receive funds to engage in additional evaluation activities for federal
programs or in research on ways to improve evaluation methods. A
second option is to make grants available to LEAs/SEAs to foster
university-LEA relationships. This might include funding for training
programs jointly sponsored by academic institutions and LEAs/SEAs.
This would not only provide training for agency personnel but would
also improve the quality of evaluation programs in universities by
allowing students to participate in actual evaluations. In addition,
university-conducted workshops could be supported as an avenue of
continuing education. There might also be an opportunity to award
matching monies for SEA/LEA investment in such arrangements as
"an endowed chair," whereby university faculty would spend a period
of time in these agencies conducting evaluations and designing
procedures that will remain after their departure.


NOTES 

1. The full report has been issued as R. F. Boruch and D. S. Cordray (eds.), An
appraisal of educational program evaluations: Federal, state, and local agencies. Report
to the Congress. Psychology Department, Northwestern University, Evanston, Illinois,
1980. It is available through the ERIC system, from the Office of Evaluation at the U.S.
Department of Education, and from the U.S. Government Printing Office (Document Number
M980 0-721-636/236). This article is a revision of Chapter 7 of the report. Revisions were
based on reactions to presentations of this material (after the report was submitted) at the
annual meeting of the Evaluation Research Society, the Northern Illinois Association for
Research and Evaluation, the State of Illinois Department of Education, the CRS, the
GAO, and elsewhere. The revisions are important, but they do not represent major
deviations from the original text.

2. For brevity's sake, we omit references in this paper. See the full report for
references. For a literature review, we refer you to R. F. Boruch and P. M. Wortman,
Implications of educational evaluation for evaluation policy. Review of Research in
Education, 1979, 7, 309-363.